Land cover classification at the Mississppi Delta¶

Background Information¶

This study will use the harmonized Sentinal/Landsat multispectral dataset to look at patterns in vegetation data. The HUC region 8 watershed extends from Missouri to the Gulf of Mexico and the lower extent near New Orleans is the focus for this analysis. The EPA ecoregion designation is Mississippi Alluvial and SE Coastal Plains. According to a publication by the Louisiana Geological Survey, the area is comprised of "...a diversity of grasses, sedges, and rushes." However, there has been a tremendous amount of human engineering of this environment.¶

Sources¶

  • USDA (2012), Response to RFI for Long-Term Agro-ecosystem Research (LTAR) Network, available online at: https://www.ars.usda.gov/ARSUserFiles/np211/LMRBProposal.pdf.
  • John Snead, Richard P. McCulloh, and Paul V. Heinrich (2019) Landforms of the Louisiana Coastal Plain, Louisiana Geological Survey, available online at: https://www.lsu.edu/lgs/publications/products/landforms_book.pdf
In [41]:
# Import needed packages
# Standard libraries
import os # for operating system manipulation
import pathlib # for managing files and directories
import pickle # enables serialization of objects for saving
import re # pattern matching in strings
import warnings # control how warnings are displayed

# Geospatial Packages
import cartopy.crs as ccrs # coordinate reference system for maps
import earthaccess # simplifies access to nasa earthdata
import earthpy as et # functions that work with raster and vector data
import geopandas as gpd # read and manipulate geo dataframes
import geoviews as gv # geospatial visualization
import hvplot.pandas # plots pandas dataframes
import hvplot.xarray # plots data arrays
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np # numerican computation
import pandas as pd # tabular dataframes
import rioxarray as rxr # combine xarray with geospatial raster data
import rioxarray.merge as rxrmerge # merging multiple raster datasets
import seaborn as sns
from tqdm.notebook import tqdm # tracking processes with progress bar
import xarray as xr # gridded datasets
from shapely.geometry import Polygon # analyze geometric objects
from sklearn.cluster import KMeans # machine learning algorithm to group data
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Environmental Variables
os.environ["GDAL_HTTP_MAX_RETRY"] = "5" # geospatial data abstraction library for website query
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1" # combined lines try website 5 times with 1 second delay between

warnings.simplefilter('ignore') # suppress warnings

Organize Information Downloads¶

In [42]:
# Setup data directories
data_dir = os.path.join(
    pathlib.Path.home(),
    'Documents',
    'eaclassprojects',
    'clusters',
    'land'
)
os.makedirs(data_dir, exist_ok=True)

data_dir
Out[42]:
'C:\\Users\\stem2\\Documents\\eaclassprojects\\clusters\\land'

Using a Decorator to Cache Results¶

In [43]:
# Define decorator
def cached(func_key, override=False):

    # save function results for retrieval and allow overwrite
    """
    A decorator to cache function results 
    
    Parameters
    ==========
    key: str
      File basename used to save pickled results
    override: bool
      When True, re-compute even if the results are already stored
    """
    # Wrap caching function
    def compute_and_cache_decorator(compute_function):
        """
        Wrap the caching function
        
        Parameters
        ==========
        compute_function: function
          The function to run and cache results
        """
        # detail usage of func_key
        def compute_and_cache(*args, **kwargs):
            """
            Perform a computation and cache, or load cached result.
            
            Parameters
            ==========
            args
              Positional arguments for the compute function
            kwargs
              Keyword arguments for the compute function
            """
            # Add an identifier from the particular function call
            if 'cache_key' in kwargs:
                key = '_'.join((func_key, kwargs['cache_key']))
            else:
                key = func_key

            path = os.path.join(
                et.io.HOME, et.io.DATA_NAME, 'jars', f'{key}.pickle')
            
            # Check if the cache exists already or override caching
            if not os.path.exists(path) or override:
                # Make jars directory if needed
                os.makedirs(os.path.dirname(path), exist_ok=True)
                
                # Run the compute function as the user did
                result = compute_function(*args, **kwargs)
                
                # Pickle the object
                with open(path, 'wb') as file:
                    pickle.dump(result, file)
            else:
                # Unpickle the object
                with open(path, 'rb') as file:
                    result = pickle.load(file)
                    
            return result
        
        return compute_and_cache
    
    return compute_and_cache_decorator

Study Site¶

In [44]:
# Read watershed boundary dataset shapefile into a GeoDataFrame
@cached('wbd_08')
def read_wbd_file(wbd_filename, huc_level, cache_key):
    # Download and unzip
    wbd_url = (
        "https://prd-tnm.s3.amazonaws.com"
        "/StagedProducts/Hydrography/WBD/HU2/Shape/"
        f"{wbd_filename}.zip")
    wbd_dir = et.data.get_data(url=wbd_url)
                  
    # Read desired data
    wbd_path = os.path.join(wbd_dir, 'Shape', f'WBDHU{huc_level}.shp')
    wbd_gdf = gpd.read_file(wbd_path, engine='pyogrio')
    return wbd_gdf

huc_level = 12 # identify HUC level
wbd_gdf = read_wbd_file(
    "WBD_08_HU2_Shape", huc_level, cache_key=f'hu{huc_level}') # Call the function using cache

delta_gdf = (
    wbd_gdf[wbd_gdf[f'huc{huc_level}']
    .isin(['080902030506'])] # filter for specific river basin
    .dissolve() # create a single polygon
)

(
    delta_gdf.to_crs(ccrs.Mercator()) # Reproject to Mercator for web mapping
    .hvplot(
        alpha=.2, fill_color='white', # set styling
        tiles='EsriImagery', crs=ccrs.Mercator()) # add background map
    .opts(title='Mississippi River Watershed, Live Oak, LA', width=600, height=300) # set plot dimensions
)
Out[44]:
In [45]:
wbd_gdf.head()
Out[45]:
tnmid metasource sourcedata sourceorig sourcefeat loaddate referenceg areaacres areasqkm states ... name hutype humod tohuc noncontrib noncontr_1 shape_Leng shape_Area ObjectID geometry
0 {8AFB1AF9-7296-4303-89DE-14CD073B859A} {511D2AC8-11BA-45FC-AB98-F69D693D4C44} Watershed Boundary Dataset (WBD) Natural Resources and Conservation Service and... None 2024-08-15 535297,540579 29441.81 119.15 LA ... Gourd Bayou-Youngs Bayou S LE,ID,DD 080500011308 0.0 0.0 NaN NaN 1 POLYGON ((-92.00021 32.53586, -91.99994 32.535...
1 {916A17A6-B4A0-4FD7-9BB8-FFD1936B15B2} {511D2AC8-11BA-45FC-AB98-F69D693D4C44} Watershed Boundary Dataset (WBD) Natural Resources and Conservation Service and... None 2024-08-15 535512 11406.67 46.16 LA ... Hams Creek S ID 080802050104 0.0 0.0 NaN NaN 2 POLYGON ((-93.37574 30.58982, -93.3747 30.5891...
2 {493C7EC1-2F1C-4B84-AFFB-6F6868A9868E} {511D2AC8-11BA-45FC-AB98-F69D693D4C44} Watershed Boundary Dataset (WBD) Natural Resources and Conservation Service and... None 2024-08-15 547190,559640 29138.21 117.92 LA ... Caney Creek-Bayou D'Arbonne S NM 080402060503 0.0 0.0 NaN NaN 3 POLYGON ((-93.07761 32.88752, -93.07784 32.887...
3 {49A3C087-B460-4F97-9D99-78CBB675248B} {511D2AC8-11BA-45FC-AB98-F69D693D4C44} Watershed Boundary Dataset (WBD) Natural Resources and Conservation Service and... None 2024-08-15 77417,78285 17759.39 71.87 AR ... L'Aigle Creek-Saline River S NM 080402020206 0.0 0.0 NaN NaN 4 POLYGON ((-92.08947 33.29383, -92.0897 33.2938...
4 {0FB41498-11EA-4AB1-AF05-E2A8E5E2E274} {511D2AC8-11BA-45FC-AB98-F69D693D4C44} Watershed Boundary Dataset (WBD) Natural Resources and Conservation Service and... None 2024-08-15 1628466 98564.62 398.88 LA ... West Cote Blanche Bay W NM 080801030800 0.0 0.0 NaN NaN 5 POLYGON ((-91.62408 29.73947, -91.62195 29.737...

5 rows × 21 columns

Access Multispectral Data¶

In [46]:
# Log in to earthaccess
earthaccess.login(persist=True)
# Search for HLS tiles
results = earthaccess.search_data(
    short_name="HLSL30",
    cloud_hosted=True,
    bounding_box=tuple(delta_gdf.total_bounds),
    temporal=("2023-05", "2023-09"),
)
# Confirm the contents
num_granules =len(results)
print(f"Number of granules found: {num_granules}")

print(results[0])
Number of granules found: 76
Collection: {'EntryTitle': 'HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -90.07372506, 'Latitude': 28.80444525}, {'Longitude': -89.57473392, 'Latitude': 28.81489352}, {'Longitude': -89.30816409, 'Latitude': 29.81030157}, {'Longitude': -90.10353675, 'Latitude': 29.79400715}, {'Longitude': -90.07372506, 'Latitude': 28.80444525}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2023-05-04T16:31:32.101Z', 'EndingDateTime': '2023-05-04T16:31:55.992Z'}}
Size(MB): 87.65197467803955
Data: ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B07.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B06.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B01.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.SZA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.SAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B05.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.VAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B10.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B09.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B11.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T16RBT.2023124T163132.v2.0/HLS.L30.T16RBT.2023124T163132.v2.0.VZA.tif']
In [47]:
results = earthaccess.search_data(
    short_name="HLSL30",
    cloud_hosted=True,
    bounding_box=tuple(delta_gdf.total_bounds),
    temporal=("2023-05", "2023-09"),
)

print(type(results))  # Prints the type of the 'results' object
<class 'list'>

Compile Information about each Granule¶

In [48]:
# Extract and organize Earthaccess data
def get_earthaccess_links(results):
    url_re = re.compile(
        r'\.(?P<tile_id>\w+)\.\d+T\d+\.v\d\.\d\.(?P<band>[A-Za-z0-9]+)\.tif')

    # Loop through each granule
    link_rows = []  # Initialize a list to hold GeoDataFrames 
 
    url_dfs = [] # Initialize a list to hold individual file data
    for granule in tqdm(results):
        # Get granule metadata information
        info_dict = granule['umm']
        granule_id = info_dict['GranuleUR']
        datetime = pd.to_datetime(
            info_dict
            ['TemporalExtent']['RangeDateTime']['BeginningDateTime'])
        points = ( # Extract polygon coordinates
            info_dict
            ['SpatialExtent']['HorizontalSpatialDomain']['Geometry']['GPolygons'][0]
            ['Boundary']['Points'])
        geometry = Polygon( # Create a Shapely Polygon object
            [(point['Longitude'], point['Latitude']) for point in points])
        
        # Get data files associated with the granule
        files = earthaccess.open([granule])

        # Loop through each file within the granule
        for file in files:
            match = url_re.search(file.full_name)
            if match is not None:
                # Create a GeoDataFrame for the current file and append to list
                link_rows.append(
                    gpd.GeoDataFrame(
                        dict(
                            datetime=[datetime],
                            tile_id=[match.group('tile_id')], # Extract tile ID
                            band=[match.group('band')], # Extract band name
                            url=[file], # Store the file object
                            geometry=[geometry] # Store the geometry
                        ), 
                        crs="EPSG:4326" # set coordinate reference system
                    )
                )

    # Concatenate GeoDataFrames into a single gdf
    file_df = pd.concat(link_rows).reset_index(drop=True) # combine all rows
    return file_df

results = earthaccess.search_data(
    short_name="HLSL30",
    cloud_hosted=True,
    bounding_box=tuple(delta_gdf.total_bounds),
    temporal=("2023-05", "2023-09"),
)

if results:  # Check if results is not empty
    first_result = results[0]  # Get the first result
    file_df = get_earthaccess_links([first_result]) # Pass a list containing only the first result to the function
    print(file_df.head())

else:
    print("No results found.") 
  0%|          | 0/1 [00:00<?, ?it/s]
QUEUEING TASKS | :   0%|          | 0/15 [00:00<?, ?it/s]
PROCESSING TASKS | :   0%|          | 0/15 [00:00<?, ?it/s]
COLLECTING RESULTS | :   0%|          | 0/15 [00:00<?, ?it/s]
                          datetime tile_id band  \
0 2023-05-04 16:31:32.101000+00:00  T16RBT  B03   
1 2023-05-04 16:31:32.101000+00:00  T16RBT  B07   
2 2023-05-04 16:31:32.101000+00:00  T16RBT  B06   
3 2023-05-04 16:31:32.101000+00:00  T16RBT  B02   
4 2023-05-04 16:31:32.101000+00:00  T16RBT  B01   

                                                 url  \
0  <File-like object HTTPFileSystem, https://data...   
1  <File-like object HTTPFileSystem, https://data...   
2  <File-like object HTTPFileSystem, https://data...   
3  <File-like object HTTPFileSystem, https://data...   
4  <File-like object HTTPFileSystem, https://data...   

                                            geometry  
0  POLYGON ((-90.07373 28.80445, -89.57473 28.814...  
1  POLYGON ((-90.07373 28.80445, -89.57473 28.814...  
2  POLYGON ((-90.07373 28.80445, -89.57473 28.814...  
3  POLYGON ((-90.07373 28.80445, -89.57473 28.814...  
4  POLYGON ((-90.07373 28.80445, -89.57473 28.814...  

Open, Crop, and Mask Data using a Cached Decorator¶

In [49]:
@cached('delta_reflectance_da_df') # Function output stored, same input gives cached output improving performance
def compute_reflectance_da(search_results, boundary_gdf):
    """
    Connect to files over VSI, crop, cloud mask, and wrangle
    
    Returns a single reflectance DataFrame 
    with all bands as columns and
    centroid coordinates and datetime as the index.
    
    Parameters
    ==========
    file_df : pd.DataFrame
        File connection and metadata (datetime, tile_id, band, and url)
    boundary_gdf : gpd.GeoDataFrame
        Boundary use to crop the data
    """
    def open_dataarray(url, boundary_proj_gdf, scale=1, masked=True):
        # Open masked DataArray
        da = rxr.open_rasterio(url, masked=masked).squeeze() * scale
        
        # Reproject boundary if needed
        if boundary_proj_gdf is None:
            boundary_proj_gdf = boundary_gdf.to_crs(da.rio.crs)
            
        # Crop
        cropped = da.rio.clip_box(*boundary_proj_gdf.total_bounds)
        return cropped
    
    def compute_quality_mask(da, mask_bits=[1, 2, 3]):
        """Mask out low quality data by bit"""
        # Unpack bits into a new axis
        bits = (
            np.unpackbits(
                da.astype(np.uint8), bitorder='little'
            ).reshape(da.shape + (-1,))
        )

        # Select the required bits and check if any are flagged
        mask = np.prod(bits[..., mask_bits]==0, axis=-1)
        return mask

    file_df = get_earthaccess_links(search_results)
    
    granule_da_rows= []
    boundary_proj_gdf = None

    # Loop through each image
    group_iter = file_df.groupby(['datetime', 'tile_id'])
    for (datetime, tile_id), granule_df in tqdm(group_iter):
        print(f'Processing granule {tile_id} {datetime}')
              
        # Open granule cloud cover
        cloud_mask_url = (
            granule_df.loc[granule_df.band=='Fmask', 'url']
            .values[0])
        cloud_mask_cropped_da = open_dataarray(cloud_mask_url, boundary_proj_gdf, masked=False)

        # Compute cloud mask
        cloud_mask = compute_quality_mask(cloud_mask_cropped_da)

        # Loop through each spectral band
        da_list = []
        df_list = []
        for i, row in granule_df.iterrows():
            if row.band.startswith('B'):
                # Open, crop, and mask the band
                band_cropped = open_dataarray(
                    row.url, boundary_proj_gdf, scale=0.0001)
                band_cropped.name = row.band
                # Add the DataArray to the metadata DataFrame row
                row['da'] = band_cropped.where(cloud_mask)
                granule_da_rows.append(row.to_frame().T)
    
    # Reassemble the metadata DataFrame
    return pd.concat(granule_da_rows)

reflectance_da_df = compute_reflectance_da(results, delta_gdf)

reflectance_da_df.head()
Out[49]:
datetime tile_id band url geometry da
45 2024-06-07 16:31:11.509000+00:00 T15RYN B04 <File-like object HTTPFileSystem, https://data... POLYGON ((-89.82661214 28.80213717, -89.795837... [[<xarray.DataArray 'B04' ()> Size: 4B\narray(...
48 2024-06-07 16:31:11.509000+00:00 T15RYN B06 <File-like object HTTPFileSystem, https://data... POLYGON ((-89.82661214 28.80213717, -89.795837... [[<xarray.DataArray 'B06' ()> Size: 4B\narray(...
49 2024-06-07 16:31:11.509000+00:00 T15RYN B03 <File-like object HTTPFileSystem, https://data... POLYGON ((-89.82661214 28.80213717, -89.795837... [[<xarray.DataArray 'B03' ()> Size: 4B\narray(...
50 2024-06-07 16:31:11.509000+00:00 T15RYN B07 <File-like object HTTPFileSystem, https://data... POLYGON ((-89.82661214 28.80213717, -89.795837... [[<xarray.DataArray 'B07' ()> Size: 4B\narray(...
51 2024-06-07 16:31:11.509000+00:00 T15RYN B02 <File-like object HTTPFileSystem, https://data... POLYGON ((-89.82661214 28.80213717, -89.795837... [[<xarray.DataArray 'B02' ()> Size: 4B\narray(...

Merge and Composite Data¶

In [51]:
@cached('delta_reflectance_da') # Function output stored, same input gives cached output improving performance
def merge_and_composite_arrays(granule_da_df):
    # Merge and composite and image for each band
    df_list = [] # List to store DataFrames
    da_list = [] # List to store the composited DataArrays for each band
    for band, band_df in tqdm(granule_da_df.groupby('band')): # Iterate through each band
        merged_das = [] # List to store merged DAs for each date within the current band

        for datetime, date_df in tqdm(band_df.groupby('datetime')): # Iterate through each date within the current band
            # Merge granules for each date
            merged_da = rxrmerge.merge_arrays(list(date_df.da))
            # Mask negative values
            merged_da = merged_da.where(merged_da>0)
            # Add the merged DA for the current date to the list
            merged_das.append(merged_da)
            
        # Composite images across dates with median reflectance value for each pixel reducing the effect of cloud cover.
        composite_da = xr.concat(merged_das, dim='datetime').median('datetime')
        composite_da['band'] = int(band[1:]) # Converts band name to integer 
        composite_da.name = 'reflectance' # Sets the name of the DataArray
        da_list.append(composite_da) # Add the composite DA for the current band to the list

    # Concatenate the composite DataArrays for all bands into a single DataArray   
    return xr.concat(da_list, dim='band') 

reflectance_da = merge_and_composite_arrays(reflectance_da_df) # Call the function to process the reflectance data
reflectance_da[0] # Display the first resulting reflectance DataArray
Out[51]:
<xarray.DataArray 'reflectance' (y: 556, x: 624)> Size: 1MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * x            (x) float64 5kB 7.926e+05 7.926e+05 ... 8.112e+05 8.113e+05
  * y            (y) float64 4kB 3.304e+06 3.304e+06 ... 3.287e+06 3.287e+06
    band         int64 8B 1
    spatial_ref  int64 8B 0
xarray.DataArray
'reflectance'
  • y: 556
  • x: 624
  • nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan
    array([[nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           ...,
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
    • x
      (x)
      float64
      7.926e+05 7.926e+05 ... 8.113e+05
      array([792568.062907, 792598.062907, 792628.062907, ..., 811198.062907,
             811228.062907, 811258.062907])
    • y
      (y)
      float64
      3.304e+06 3.304e+06 ... 3.287e+06
      array([3303783.495888, 3303753.495888, 3303723.495888, ..., 3287193.495888,
             3287163.495888, 3287133.495888])
    • band
      ()
      int64
      1
      array(1)
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      PROJCS["WGS 84 / UTM zone 15N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-93],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32615"]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      WGS 84
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      WGS 84 / UTM zone 15N
      grid_mapping_name :
      transverse_mercator
      latitude_of_projection_origin :
      0.0
      longitude_of_central_meridian :
      -93.0
      false_easting :
      500000.0
      false_northing :
      0.0
      scale_factor_at_central_meridian :
      0.9996
      spatial_ref :
      PROJCS["WGS 84 / UTM zone 15N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-93],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32615"]]
      GeoTransform :
      792553.062907047 30.0 0.0 3303798.4958882914 0.0 -30.0
      array(0)
    • x
      PandasIndex
      PandasIndex(Index([792568.062907047, 792598.062907047, 792628.062907047, 792658.062907047,
             792688.062907047, 792718.062907047, 792748.062907047, 792778.062907047,
             792808.062907047, 792838.062907047,
             ...
             810988.062907047, 811018.062907047, 811048.062907047, 811078.062907047,
             811108.062907047, 811138.062907047, 811168.062907047, 811198.062907047,
             811228.062907047, 811258.062907047],
            dtype='float64', name='x', length=624))
    • y
      PandasIndex
      PandasIndex(Index([3303783.4958882914, 3303753.4958882914, 3303723.4958882914,
             3303693.4958882914, 3303663.4958882914, 3303633.4958882914,
             3303603.4958882914, 3303573.4958882914, 3303543.4958882914,
             3303513.4958882914,
             ...
             3287403.4958882914, 3287373.4958882914, 3287343.4958882914,
             3287313.4958882914, 3287283.4958882914, 3287253.4958882914,
             3287223.4958882914, 3287193.4958882914, 3287163.4958882914,
             3287133.4958882914],
            dtype='float64', name='y', length=556))

Analyze using K-MEANS¶

In [58]:
### how many different types of vegetation are there?
# Convert spectral DataArray to DataFrame
model_df = reflectance_da.to_dataframe().reflectance.unstack('band')
model_df = model_df.drop(columns=[10, 11]).dropna()

model_df.head()
Out[58]:
band 1 2 3 4 5 6 7 9
y x
3.303783e+06 810148.062907 0.01560 0.0225 0.0409 0.0366 0.0478 0.0281 0.0181 0.0006
810178.062907 0.01895 0.0256 0.0396 0.0413 0.0426 0.0284 0.0220 0.0006
810208.062907 0.01915 0.0246 0.0387 0.0377 0.0384 0.0273 0.0236 0.0007
810238.062907 0.02040 0.0247 0.0440 0.0445 0.0629 0.0418 0.0266 0.0007
810268.062907 0.01815 0.0245 0.0437 0.0444 0.0618 0.0397 0.0259 0.0008
In [59]:
# Running the fit and predict functions at the same time.
# We can do this since we don't have target data.
prediction = KMeans(n_clusters=6).fit_predict(model_df.values)

# Add the predicted values back to the model DataFrame
model_df['clusters'] = prediction
model_df
Out[59]:
band 1 2 3 4 5 6 7 9 clusters
y x
3.303783e+06 810148.062907 0.01560 0.0225 0.0409 0.0366 0.0478 0.0281 0.0181 0.0006 0
810178.062907 0.01895 0.0256 0.0396 0.0413 0.0426 0.0284 0.0220 0.0006 0
810208.062907 0.01915 0.0246 0.0387 0.0377 0.0384 0.0273 0.0236 0.0007 0
810238.062907 0.02040 0.0247 0.0440 0.0445 0.0629 0.0418 0.0266 0.0007 4
810268.062907 0.01815 0.0245 0.0437 0.0444 0.0618 0.0397 0.0259 0.0008 4
... ... ... ... ... ... ... ... ... ... ...
3.287163e+06 793798.062907 0.02650 0.0345 0.0548 0.0427 0.0218 0.0098 0.0074 0.0007 0
793828.062907 0.02790 0.0351 0.0549 0.0439 0.0221 0.0104 0.0076 0.0008 0
793858.062907 0.02580 0.0331 0.0534 0.0419 0.0194 0.0080 0.0059 0.0009 0
793888.062907 0.02570 0.0326 0.0521 0.0402 0.0182 0.0064 0.0046 0.0007 0
793918.062907 0.02550 0.0340 0.0541 0.0423 0.0199 0.0083 0.0060 0.0007 0

318248 rows × 9 columns

Plot Reflectance¶

In [60]:
# Select bands from reflectance data array
rgb = reflectance_da.sel(band=[4, 3, 2]) # band 4=red, 3=green and 2=blue
rgb_uint8 = (rgb * 255).astype(np.uint8).where(rgb!=np.nan) # convert to integers (0-255)
rgb_bright = rgb_uint8 * 10 # increase brightness 
rgb_sat = rgb_bright.where(rgb_bright < 255, 255) # no pixel exceeds 255

# Create a composite RGB image plot in square
(
    rgb_sat.hvplot.rgb( 
        x='x', y='y', bands='band',
        data_aspect=1,
        xaxis=None, yaxis=None)
    + # Overlay cluster data
    model_df.clusters.to_xarray().sortby(['x', 'y']).hvplot(
        cmap="accent", aspect='equal', xaxis=None, yaxis=None) 
)
Out[60]:

Landcover Analysis using K-Means Clusters from Sentinel/Landsat Multispectral Data¶

Land Cover Interpretation based on Spectral Data¶

According to America’s Watershed Initiative, the wetlands in the lower Mississippi region being studied have been depleted annually since the 1930s and excess nutrient discharges have created "dead zones" or areas of low oxygen where life struggles to exist. [1] The K-means cluster analysis in the above images shows 6 clusters of land forms spread out over the region. Some, like clusters, 4 and 0 have areas where they are concentrated, but most landform clusters are highly dispersed. In a study by Roy et al, their " ...data suggests that single transition land loss is caused by wave-edge erosion, whereas multiple transition land loss is caused by subsidence." [2] Given that most of the landform clusters are fragmented, I would expect clusters 4 and 0 to be mostly aquatic and the remainder to be composed of grasses, sedges and rushes.¶

Source¶

  1. America’s Watershed Initiative Report Card for the Mississippi River. Dec. 2015. Available online at: https://americaswatershed.org/wp-content/uploads/2015/12/Mississippi-River-Report-Card-Methods-v10.1.pdf
  2. Samapriya Roy et al. 2020. Spatial and temporal patterns of land loss in the Lower Mississippi River Delta from 1983 to 2016. Remote Sensing of Environment 250 (2020) 112046. Available online at: https://www.sciencedirect.com/science/article/abs/pii/S0034425720304168